This report demonstrates basic exploration of COVID-19 data, with the focus on computing and graphing data on relative timelines, in which the “day zero” is unique for each country.
After this tutorial, the reader should be able to:
plotlyIn this tutorial we will have three goals. First, we would like to have a tool to plot multiple trajectories onto the same canvas and explore the differences between trajectories plotted on the same scale. Second, we would like to create a unique temporal context for each trajectory and express it with respect meaningful anchors, such as first detected case or first confirmed death. Thirdly, we would like to better understand the sequence of these temporal anchors to have a better understanding of how the pandemic unfolded.
Here are the graphs we will be creating in this tutorial:
Non-technical readers are welcome to skip this section.
library(magrittr) #Pipes
library(ggplot2) #For graphing
library(dplyr) # for shorter function names. but still prefer dplyr:: stems
library(lubridate) # for working with dates
library(plotly) # interactive graphs
config <- config::get()
# to be applied to every graph we will make
ggplot2::theme_set(
ggplot2::theme_bw(
)+
theme(
strip.background = element_rect(fill="grey90", color = NA)
)
)
# important dates we will refer to in analysis
date_of_exodus <- lubridate::date("2020-01-13") # first case outside of China
date_of_pandemic <- lubridate::date("2020-03-11") # WHO declares pandemic
# to help us focus on a manageable set of countries for purposes of demonstration
oecd_countries <- c(
"AUS", "AUT", "BEL", "CAN", "CZE", "DNK", "EST", "FIN", "FRA",
"DEU", "GRC", "HUN", "ISL", "IRL", "ISR", "ITA", "JPN", "KOR",
"LVA", "LTU", "MEX", "NLD", "NZL", "NOR", "POL", "PRT", "SVK",
"SVN", "ESP", "SWE", "CHE", "TUR", "GBR", "USA", "RUS", "ZAF"
)
# adds neat styling to your knitr table
neat <- function(x, output_format = "html"){
# knitr.table.format = output_format
if(output_format == "pandoc"){
x_t <- knitr::kable(x, format = "pandoc")
}else{
x_t <- x %>%
# x %>%
# knitr::kable() %>%
knitr::kable(format=output_format) %>%
kableExtra::kable_styling(
bootstrap_options = c("striped", "hover", "condensed","responsive"),
# bootstrap_options = c( "condensed"),
full_width = F,
position = "left"
)
}
return(x_t)
}
The data comes from European Centre for Disease Prevention and Control and can be downloaded here. I discuss the preparation of this data for analysis is a separate post.
# covid data
ds_covid <- readr::read_csv(config$path_input_covid)
ds_covid %>% glimpse()
Rows: 31,365
Columns: 8
$ date <date> 2019-12-31, 2020-01-01, 2020-01-02, 2020-01-03, 2020-01-04, 2020-01-05, 2020-01-06, 2020...
$ n_cases <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
$ n_deaths <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
$ country_code <chr> "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG"...
$ n_population_2018 <dbl> 37172386, 37172386, 37172386, 37172386, 37172386, 37172386, 37172386, 37172386, 37172386,...
$ country_code2 <chr> "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF",...
$ country_label <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan",...
$ continent_label <chr> "Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "...
# note that only `date`, `n_cases`, and `n_deatsh` change with time
# other variables have the same value within each country
A few things to notice about this data set:
Only date , n_cases, and n_deaths change over time. All other variables have a single unique value for each country.
For some countries, the observations are missing for certain dates, for example:
ds_covid %>% filter(country_code == "FIN") %>% filter(date > as_date("2020-02-26"))
# A tibble: 95 x 8
date n_cases n_deaths country_code n_population_2018 country_code2 country_label continent_label
<date> <dbl> <dbl> <chr> <dbl> <chr> <chr> <chr>
1 2020-02-27 1 0 FIN 5518050 FI Finland Europe
2 2020-02-28 0 0 FIN 5518050 FI Finland Europe
3 2020-02-29 1 0 FIN 5518050 FI Finland Europe
4 2020-03-01 0 0 FIN 5518050 FI Finland Europe
5 2020-03-02 3 0 FIN 5518050 FI Finland Europe
6 2020-03-03 NA NA FIN 5518050 FI Finland Europe
7 2020-03-04 1 0 FIN 5518050 FI Finland Europe
8 2020-03-05 NA NA FIN 5518050 FI Finland Europe
9 2020-03-06 5 0 FIN 5518050 FI Finland Europe
10 2020-03-07 7 0 FIN 5518050 FI Finland Europe
# ... with 85 more rows
The data preparation step ensured that each country has the same number of rows, creating these missing cells. This is important for two reasons: A) we want to be able to differentiate between the absence of cases and the absence of reporting and B) missing dates will complicate comutation of relative timelines.
# to help us focus on a smaller subset of countries
oecd_countries <- c(
"AUS", "AUT", "BEL", "CAN", "CZE", "DNK", "EST", "FIN", "FRA",
"DEU", "GRC", "HUN", "ISL", "IRL", "ISR", "ITA", "JPN", "KOR",
"LVA", "LTU", "MEX", "NLD", "NZL", "NOR", "POL", "PRT", "SVK",
"SVN", "ESP", "SWE", "CHE", "TUR", "GBR", "USA", "RUS", "ZAF"
)
# to have a handy filter
ds_covid <- ds_covid %>%
mutate(
oecd = country_code %in% oecd_countries
)
We focus on 36 country members of the Organization for Economic Co-operation and Development (OECD) because they report more nuanced data on their economic and social development, so we can have a richer pool of explanatory variables (see http://stats.oecd.org/ )
Computing cumulative (running) sum is easily accomplished with cumsum function paired with group_by, however, watch out for NA values: they will break the running sum, resulting in the NA for the rest of the column:
ds_covid <- ds_covid %>%
group_by(country_code) %>%
mutate(
n_cases_cum = cumsum(n_cases) # would produce NA after en
# n_cases_cum = cumsum(tidyr::replace_na(n_cases,0))
) %>%
ungroup()
ds_covid %>%
filter(country_code == "FIN") %>% filter(date > as_date("2020-02-26")) %>%
select(date, country_code, n_cases, n_cases_cum)
# A tibble: 95 x 4
date country_code n_cases n_cases_cum
<date> <chr> <dbl> <dbl>
1 2020-02-27 FIN 1 2
2 2020-02-28 FIN 0 2
3 2020-02-29 FIN 1 3
4 2020-03-01 FIN 0 3
5 2020-03-02 FIN 3 6
6 2020-03-03 FIN NA NA
7 2020-03-04 FIN 1 NA
8 2020-03-05 FIN NA NA
9 2020-03-06 FIN 5 NA
10 2020-03-07 FIN 7 NA
# ... with 85 more rows
To avoid this, we use convert NA to 0 on the fly with tidyr::replace_na() function:
ds_covid <- ds_covid %>%
group_by(country_code) %>%
mutate(
# n_cases_cum = cumsum(n_cases) # would produce NA after en
n_cases_cum = cumsum(tidyr::replace_na(n_cases,0))
) %>%
ungroup()
ds_covid %>%
filter(country_code == "FIN") %>% filter(date > as_date("2020-02-26")) %>%
select(date, country_code, n_cases, n_cases_cum)
# A tibble: 95 x 4
date country_code n_cases n_cases_cum
<date> <chr> <dbl> <dbl>
1 2020-02-27 FIN 1 2
2 2020-02-28 FIN 0 2
3 2020-02-29 FIN 1 3
4 2020-03-01 FIN 0 3
5 2020-03-02 FIN 3 6
6 2020-03-03 FIN NA 6
7 2020-03-04 FIN 1 7
8 2020-03-05 FIN NA 7
9 2020-03-06 FIN 5 12
10 2020-03-07 FIN 7 19
# ... with 85 more rows
This option is better than converting NAs to zero during the data preparation step, as this would mask the absence of reporting, overwriting it with a definitive value of 0 cases.
Now, with the cumulative case available, we can plot the trajectory in a interactive graph with plotly package:
library(plotly)
g1 <- ds_covid %>%
filter(oecd) %>%
highlight_key(~country_label )%>%
plot_ly(
x = ~ date, y = ~ log(n_cases_cum)
) %>%
group_by(country_label) %>%
add_lines()
g1 %>%
highlight(
on = "plotly_click"
,selectize = TRUE
,dynamic = TRUE
,persistent = TRUE
)
Note that we add the highlight component with the plotly::highlight() function.
If you are familiar with ggplot2, the syntax of plotly should not be too confusing, however, it does take some time to get used to.
For more options and syntax guide, see https://plotly-r.com/client-side-linking.html. There is also a package for rendering ggplot2 into interactive graphs (ggplotly), which offers less flexibility in display design compared to creating graphs directly with plotly, but is much simpler to implement.
g1 <- ds_covid %>%
filter(oecd) %>%
ggplot(aes(x = date, y= log(n_cases_cum)))+
geom_line(aes(group = country_code))
g1 <- plotly::ggplotly(g1) %>% plotly::highlight(on = "plotly_hover")
g1
It is often makes sense to compare the progression of epidemics across countries using a meaningful “time zero”, for example the day of the first confirmed case or the first confirmed death in the country. To help us carry out the computation, let us construct a fictional example that we can use to develop the script
# create reproducible example (reprex) to test out your function
d_reprex <- tibble::tribble(
~country_code, ~date, ~n_cases,
"Alabnia", "2020-03-01", NA,
"Alabnia", "2020-03-02", 0,
"Alabnia", "2020-03-03", 1,
"Alabnia", "2020-03-04", 0,
"Alabnia", "2020-03-05", 3,
"Botswana", "2020-04-01", 0,
"Botswana", "2020-04-02", NA,
"Botswana", "2020-04-03", 2,
"Botswana", "2020-04-04", 3,
"Botswana", "2020-04-05", 0,
"Chile", "2020-05-01", 2,
"Chile", "2020-05-02", 0,
"Chile", "2020-05-03", 0,
"Chile", "2020-05-04", 3,
"Chile", "2020-05-05", 1,
) %>%
mutate(
date = lubridate::as_date(date)
)
d_reprex %>% neat()
| country_code | date | n_cases |
|---|---|---|
| Alabnia | 2020-03-01 | NA |
| Alabnia | 2020-03-02 | 0 |
| Alabnia | 2020-03-03 | 1 |
| Alabnia | 2020-03-04 | 0 |
| Alabnia | 2020-03-05 | 3 |
| Botswana | 2020-04-01 | 0 |
| Botswana | 2020-04-02 | NA |
| Botswana | 2020-04-03 | 2 |
| Botswana | 2020-04-04 | 3 |
| Botswana | 2020-04-05 | 0 |
| Chile | 2020-05-01 | 2 |
| Chile | 2020-05-02 | 0 |
| Chile | 2020-05-03 | 0 |
| Chile | 2020-05-04 | 3 |
| Chile | 2020-05-05 | 1 |
We already know how to compute cumulative counts
d_reprex_timeline <- d_reprex %>%
group_by(country_code) %>%
mutate(
n_cases_cum = cumsum(tidyr::replace_na(n_cases,0))
)
d_reprex_timeline
# A tibble: 15 x 4
# Groups: country_code [3]
country_code date n_cases n_cases_cum
<chr> <date> <dbl> <dbl>
1 Alabnia 2020-03-01 NA 0
2 Alabnia 2020-03-02 0 0
3 Alabnia 2020-03-03 1 1
4 Alabnia 2020-03-04 0 1
5 Alabnia 2020-03-05 3 4
6 Botswana 2020-04-01 0 0
7 Botswana 2020-04-02 NA 0
8 Botswana 2020-04-03 2 2
9 Botswana 2020-04-04 3 5
10 Botswana 2020-04-05 0 5
11 Chile 2020-05-01 2 2
12 Chile 2020-05-02 0 2
13 Chile 2020-05-03 0 2
14 Chile 2020-05-04 3 5
15 Chile 2020-05-05 1 6
Now we will use a simple logical test to create a logical variable indicating when the running total exceeded the value of the chosen threshold.
d_reprex_timeline <- d_reprex %>%
group_by(country_code) %>%
mutate(
n_cases_cum = cumsum(tidyr::replace_na(n_cases,0))
,onset_case = n_cases_cum > 0
)
d_reprex_timeline
# A tibble: 15 x 5
# Groups: country_code [3]
country_code date n_cases n_cases_cum onset_case
<chr> <date> <dbl> <dbl> <lgl>
1 Alabnia 2020-03-01 NA 0 FALSE
2 Alabnia 2020-03-02 0 0 FALSE
3 Alabnia 2020-03-03 1 1 TRUE
4 Alabnia 2020-03-04 0 1 TRUE
5 Alabnia 2020-03-05 3 4 TRUE
6 Botswana 2020-04-01 0 0 FALSE
7 Botswana 2020-04-02 NA 0 FALSE
8 Botswana 2020-04-03 2 2 TRUE
9 Botswana 2020-04-04 3 5 TRUE
10 Botswana 2020-04-05 0 5 TRUE
11 Chile 2020-05-01 2 2 TRUE
12 Chile 2020-05-02 0 2 TRUE
13 Chile 2020-05-03 0 2 TRUE
14 Chile 2020-05-04 3 5 TRUE
15 Chile 2020-05-05 1 6 TRUE
It may make sense to use other operationalization the “onset” event.. For example, we can define it as “the date of the 10th case” or “3rd death” or “.1% of population infected”. Now with the column onset_case marking whether the running total is higher than a threshold, we can identify the first occurrence of TRUE :
d_reprex_timeline <- d_reprex %>%
group_by(country_code) %>%
mutate(
n_cases_cum = cumsum(tidyr::replace_na(n_cases,0))
,onset_case = n_cases_cum > 0
,first_case = cumsum(onset_case) == 1L
)
d_reprex_timeline
# A tibble: 15 x 6
# Groups: country_code [3]
country_code date n_cases n_cases_cum onset_case first_case
<chr> <date> <dbl> <dbl> <lgl> <lgl>
1 Alabnia 2020-03-01 NA 0 FALSE FALSE
2 Alabnia 2020-03-02 0 0 FALSE FALSE
3 Alabnia 2020-03-03 1 1 TRUE TRUE
4 Alabnia 2020-03-04 0 1 TRUE FALSE
5 Alabnia 2020-03-05 3 4 TRUE FALSE
6 Botswana 2020-04-01 0 0 FALSE FALSE
7 Botswana 2020-04-02 NA 0 FALSE FALSE
8 Botswana 2020-04-03 2 2 TRUE TRUE
9 Botswana 2020-04-04 3 5 TRUE FALSE
10 Botswana 2020-04-05 0 5 TRUE FALSE
11 Chile 2020-05-01 2 2 TRUE TRUE
12 Chile 2020-05-02 0 2 TRUE FALSE
13 Chile 2020-05-03 0 2 TRUE FALSE
14 Chile 2020-05-04 3 5 TRUE FALSE
15 Chile 2020-05-05 1 6 TRUE FALSE
Notice, that we use the property of logical class: when used in mathematical expression, FALSE assumes the value of 0, while TRUE is interpreted as 1.
Now we can use this indicator to extract the date associated with this row:
d_reprex_timeline <- d_reprex %>%
group_by(country_code) %>%
mutate(
n_cases_cum = cumsum(tidyr::replace_na(n_cases,0))
,onset_case = n_cases_cum > 0
,first_case = cumsum(onset_case) == 1L
,date_of_1case1 = ifelse(first_case, date, NA) %>% lubridate::as_date()
)
d_reprex_timeline
# A tibble: 15 x 7
# Groups: country_code [3]
country_code date n_cases n_cases_cum onset_case first_case date_of_1case1
<chr> <date> <dbl> <dbl> <lgl> <lgl> <date>
1 Alabnia 2020-03-01 NA 0 FALSE FALSE NA
2 Alabnia 2020-03-02 0 0 FALSE FALSE NA
3 Alabnia 2020-03-03 1 1 TRUE TRUE 2020-03-03
4 Alabnia 2020-03-04 0 1 TRUE FALSE NA
5 Alabnia 2020-03-05 3 4 TRUE FALSE NA
6 Botswana 2020-04-01 0 0 FALSE FALSE NA
7 Botswana 2020-04-02 NA 0 FALSE FALSE NA
8 Botswana 2020-04-03 2 2 TRUE TRUE 2020-04-03
9 Botswana 2020-04-04 3 5 TRUE FALSE NA
10 Botswana 2020-04-05 0 5 TRUE FALSE NA
11 Chile 2020-05-01 2 2 TRUE TRUE 2020-05-01
12 Chile 2020-05-02 0 2 TRUE FALSE NA
13 Chile 2020-05-03 0 2 TRUE FALSE NA
14 Chile 2020-05-04 3 5 TRUE FALSE NA
15 Chile 2020-05-05 1 6 TRUE FALSE NA
There will be only one date associated with traspassing the threshold, so we populate the rest of the cells with it
d_reprex_timeline <- d_reprex %>%
group_by(country_code) %>%
mutate(
n_cases_cum = cumsum(tidyr::replace_na(n_cases,0))
,onset_case = n_cases_cum > 0
,first_case = cumsum(onset_case) == 1L
,date_of_1case1 = ifelse(first_case, date, NA) %>% lubridate::as_date()
,date_of_1case2 = min(date_of_1case1, na.rm = T)
)
d_reprex_timeline
# A tibble: 15 x 8
# Groups: country_code [3]
country_code date n_cases n_cases_cum onset_case first_case date_of_1case1 date_of_1case2
<chr> <date> <dbl> <dbl> <lgl> <lgl> <date> <date>
1 Alabnia 2020-03-01 NA 0 FALSE FALSE NA 2020-03-03
2 Alabnia 2020-03-02 0 0 FALSE FALSE NA 2020-03-03
3 Alabnia 2020-03-03 1 1 TRUE TRUE 2020-03-03 2020-03-03
4 Alabnia 2020-03-04 0 1 TRUE FALSE NA 2020-03-03
5 Alabnia 2020-03-05 3 4 TRUE FALSE NA 2020-03-03
6 Botswana 2020-04-01 0 0 FALSE FALSE NA 2020-04-03
7 Botswana 2020-04-02 NA 0 FALSE FALSE NA 2020-04-03
8 Botswana 2020-04-03 2 2 TRUE TRUE 2020-04-03 2020-04-03
9 Botswana 2020-04-04 3 5 TRUE FALSE NA 2020-04-03
10 Botswana 2020-04-05 0 5 TRUE FALSE NA 2020-04-03
11 Chile 2020-05-01 2 2 TRUE TRUE 2020-05-01 2020-05-01
12 Chile 2020-05-02 0 2 TRUE FALSE NA 2020-05-01
13 Chile 2020-05-03 0 2 TRUE FALSE NA 2020-05-01
14 Chile 2020-05-04 3 5 TRUE FALSE NA 2020-05-01
15 Chile 2020-05-05 1 6 TRUE FALSE NA 2020-05-01
This allows us for a very straightforward computation of the distance between any given date and the date of the “onset”, in this case the date of the first confirmed case:
d_reprex_timeline <- d_reprex %>%
group_by(country_code) %>%
mutate(
n_cases_cum = cumsum(tidyr::replace_na(n_cases,0))
#,onset_case = n_cases_cum > 0
# ,first_case = cumsum(onset_case) == 1L
# ,date_of_1case1 = ifelse(first_case, date, NA) %>% lubridate::as_date()
# ,date_of_1case2 = min(date_of_1case1, na.rm = T)
# alternatively, as a single sentence:
,date_of_1case = ifelse(cumsum(n_cases_cum > 0) == 1L, date, NA) %>%
min(na.rm=T) %>%
lubridate::as_date()
# relative timeline
,days_since_1case = (date - date_of_1case) %>% as.integer()
) %>%
ungroup() %>%
select(-date_of_1case) # because it is easily derived
d_reprex_timeline %>% neat()
| country_code | date | n_cases | n_cases_cum | days_since_1case |
|---|---|---|---|---|
| Alabnia | 2020-03-01 | NA | 0 | -2 |
| Alabnia | 2020-03-02 | 0 | 0 | -1 |
| Alabnia | 2020-03-03 | 1 | 1 | 0 |
| Alabnia | 2020-03-04 | 0 | 1 | 1 |
| Alabnia | 2020-03-05 | 3 | 4 | 2 |
| Botswana | 2020-04-01 | 0 | 0 | -2 |
| Botswana | 2020-04-02 | NA | 0 | -1 |
| Botswana | 2020-04-03 | 2 | 2 | 0 |
| Botswana | 2020-04-04 | 3 | 5 | 1 |
| Botswana | 2020-04-05 | 0 | 5 | 2 |
| Chile | 2020-05-01 | 2 | 2 | 0 |
| Chile | 2020-05-02 | 0 | 2 | 1 |
| Chile | 2020-05-03 | 0 | 2 | 2 |
| Chile | 2020-05-04 | 3 | 5 | 3 |
| Chile | 2020-05-05 | 1 | 6 | 4 |
Finally, we can re-express these steps more succinctly, however, it might be advisible to leave these step in comments in case you need to retrace your steps or debug an error down the stream
d_reprex_timeline <- d_reprex %>%
group_by(country_code) %>%
mutate(
n_cases_cum = cumsum(tidyr::replace_na(n_cases,0))
#,onset_case = n_cases_cum > 0
# ,first_case = cumsum(onset_case) == 1L
# ,date_of_1case1 = ifelse(first_case, date, NA) %>% lubridate::as_date()
# ,date_of_1case2 = min(date_of_1case1, na.rm = T)
# alternatively, as a single sentence:
,date_of_1case = ifelse(cumsum(n_cases_cum > 0) == 1L, date, NA) %>%
min(na.rm=T) %>%
lubridate::as_date()
# relative timeline
,days_since_1case = (date - date_of_1case) %>% as.integer()
) %>%
ungroup() %>%
select(-date_of_1case) # because it is easily derived
d_reprex_timeline %>% neat()
| country_code | date | n_cases | n_cases_cum | days_since_1case |
|---|---|---|---|---|
| Alabnia | 2020-03-01 | NA | 0 | -2 |
| Alabnia | 2020-03-02 | 0 | 0 | -1 |
| Alabnia | 2020-03-03 | 1 | 1 | 0 |
| Alabnia | 2020-03-04 | 0 | 1 | 1 |
| Alabnia | 2020-03-05 | 3 | 4 | 2 |
| Botswana | 2020-04-01 | 0 | 0 | -2 |
| Botswana | 2020-04-02 | NA | 0 | -1 |
| Botswana | 2020-04-03 | 2 | 2 | 0 |
| Botswana | 2020-04-04 | 3 | 5 | 1 |
| Botswana | 2020-04-05 | 0 | 5 | 2 |
| Chile | 2020-05-01 | 2 | 2 | 0 |
| Chile | 2020-05-02 | 0 | 2 | 1 |
| Chile | 2020-05-03 | 0 | 2 | 2 |
| Chile | 2020-05-04 | 3 | 5 | 3 |
| Chile | 2020-05-05 | 1 | 6 | 4 |
ds_covid_timeline <- ds_covid %>%
group_by(country_code) %>%
mutate(
# compute timeline of cumulative confirmed cases
n_cases_cum = cumsum(tidyr::replace_na(n_cases,0))
,date_of_1case = ifelse(cumsum(n_cases_cum > 0) == 1L, date, NA) %>%
min(na.rm=T) %>%
lubridate::as_date()
,days_since_1case = (date - date_of_1case) %>% as.integer()
# compute timeine of cumulative deaths
,n_deaths_cum = cumsum(tidyr::replace_na(n_deaths,0))
,date_of_1death = ifelse(cumsum(n_deaths_cum > 0) == 1L, date, NA) %>%
min(na.rm=T) %>%
lubridate::as_date()
,days_since_1death = (date - date_of_1death) %>% as.integer()
# compute absolute timeline
,days_since_exodus = as.integer(date - date_of_exodus) # first case outside of china
,days_since_pandemic = as.integer(date - date_of_pandemic) # WHO declares pandemic
,n_deaths_cum_per_1m = as.integer(n_deaths_cum/n_population_2018*1000000)
,n_cases_cum_per_1m = as.integer(n_cases_cum/ n_population_2018*1000000)
) %>%
ungroup() %>%
select(-date_of_1case, -date_of_1death) %>%
select(
date, country_code, n_cases, n_deaths, n_cases_cum, n_deaths_cum,
days_since_1case, days_since_1death, days_since_exodus, days_since_pandemic,
dplyr::everything()
)
ds_covid_timeline %>% glimpse()
Rows: 31,365
Columns: 17
$ date <date> 2019-12-31, 2020-01-01, 2020-01-02, 2020-01-03, 2020-01-04, 2020-01-05, 2020-01-06, 20...
$ country_code <chr> "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AF...
$ n_cases <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
$ n_deaths <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
$ n_cases_cum <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
$ n_deaths_cum <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
$ days_since_1case <int> -56, -55, -54, -53, -52, -51, -50, -49, -48, -47, -46, -45, -44, -43, -42, -41, -40, -3...
$ days_since_1death <int> -84, -83, -82, -81, -80, -79, -78, -77, -76, -75, -74, -73, -72, -71, -70, -69, -68, -6...
$ days_since_exodus <int> -13, -12, -11, -10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1...
$ days_since_pandemic <int> -71, -70, -69, -68, -67, -66, -65, -64, -63, -62, -61, -60, -59, -58, -57, -56, -55, -5...
$ n_population_2018 <dbl> 37172386, 37172386, 37172386, 37172386, 37172386, 37172386, 37172386, 37172386, 3717238...
$ country_code2 <chr> "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF...
$ country_label <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan...
$ continent_label <chr> "Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "Asia",...
$ oecd <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL...
$ n_deaths_cum_per_1m <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
$ n_cases_cum_per_1m <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
g1 <- d1 %>%
ggplot(aes(
x = days_since_exodus
,y = n_cases_cum
))+
geom_line()+
facet_wrap(~country_label, scale = "free", ncol = 6)+
geom_point(
data = d1 %>% filter(days_since_1case == 1)
,size = 2, fill = "#1b9e77", color = "black", alpha = .5, shape = 21
)+
geom_point(
data = d1 %>% filter(days_since_1death == 1)
,size = 2, fill = "#d95f02", color = "black", alpha = .5, shape = 21
)+
geom_vline(xintercept = 58, linetype = "dotted",)+
geom_vline(xintercept = 75, linetype = "dashed", alpha = .5)+
geom_vline(xintercept = 100, linetype = "dashed", color = "red", alpha = .5)+
scale_x_continuous(breaks = seq(0,100, 50))+
labs(
title = "Timeline of COVID-19: Cumulative Cases"
, y = "Cumulative Cases (in thousands)", x = "Days since first case outside of China (Jan 13, 2020)"
, caption = "(first dot) = 1st confirmed case, (second dot) = 1st confirmed death,
(dotted line) = pandemic announced by WHO, (dashed lines) = 75 and 100th day since Exodus"
)
cat("\n## Cases\n")
g1
# cat("\n## Cases per 1m\n")
# g1 + aes(y = n_cases_cum_per_1m)+labs(y = "Cumulative Cases per 1 mil (in thousands)",
# title = "Timeline of COVID-19: Cumulative Cases per 1 million")
# cat("\n## Deaths\n")
# g1 + aes(y = n_deaths_cum)+labs(y = "Cumulative Deaths",
# title = "Timeline of COVID-19: Cumulative Deaths")
# cat("\n## Deaths per 1m\n")
# g1 + aes(y = n_deaths_cum_per_1m)+labs(y = "Cumulative Deaths per 1 mil",
# title = "Timeline of COVID-19: Cumulative Deaths per 1 million")
g3 <- ds_covid_timeline %>%
filter(oecd) %>%
filter(days_since_1case == 0) %>%
mutate(
country_label = forcats::fct_reorder(country_label, days_since_exodus)
,days_to_1death_since_exodus = (-1*days_since_1death) + days_since_exodus
) %>%
ggplot(aes(x = days_since_exodus, y = country_label))+
geom_segment(aes(yend = country_label, xend = 0), linetype=NA, alpha = .8)+
geom_segment(aes(yend = country_label, xend = days_to_1death_since_exodus, color = "red"))+
geom_point(shape = 21, size =2, alpha = .6, fill = "#1b9e77")+
geom_point(aes(x = days_to_1death_since_exodus), shape = 21, size =2, alpha = .6, fill = "#d95f02")+
geom_text(aes(label = country_code2, x = days_to_1death_since_exodus), hjust = -1, size = 3, color = "grey60")+
scale_x_continuous(breaks = seq(0,140, 20))+
guides(color = F)+
labs(
title = "COVID Timeline: Days to 1st Case and 1st Death"
,x = "Days to first case since exodus (January 13)", y = NULL
,caption = "(green dot) = 1st confirmed case, (orange dot) = 1st confirmed death"
)
g3
For the sake of documentation and reproducibility, the current report was rendered in the following environment. Click the line below to expand.
Environment
- Session info -------------------------------------------------------------------------------------------------------
setting value
version R version 3.6.3 (2020-02-29)
os Windows 10 x64
system x86_64, mingw32
ui RTerm
language (EN)
collate English_United States.1252
ctype English_United States.1252
tz America/New_York
date 2020-05-31
- Packages -----------------------------------------------------------------------------------------------------------
package * version date lib source
assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.2)
backports 1.1.5 2019-10-02 [1] CRAN (R 3.6.1)
callr 3.4.3 2020-03-28 [1] CRAN (R 3.6.3)
cli 2.0.2 2020-02-28 [1] CRAN (R 3.6.3)
codetools 0.2-16 2018-12-24 [2] CRAN (R 3.6.3)
colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.6.1)
config 0.3 2018-03-27 [1] CRAN (R 3.6.3)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.2)
crosstalk * 1.0.0 2016-12-21 [1] CRAN (R 3.6.2)
data.table 1.12.8 2019-12-09 [1] CRAN (R 3.6.2)
desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.2)
devtools 2.3.0 2020-04-10 [1] CRAN (R 3.6.3)
digest 0.6.25 2020-02-23 [1] CRAN (R 3.6.3)
dplyr * 0.8.5 2020-03-07 [1] CRAN (R 3.6.3)
ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.6.2)
evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.2)
fansi 0.4.1 2020-01-08 [1] CRAN (R 3.6.2)
farver 2.0.3 2020-01-16 [1] CRAN (R 3.6.2)
fastmap 1.0.1 2019-10-08 [1] CRAN (R 3.6.2)
forcats 0.4.0 2019-02-17 [1] CRAN (R 3.6.2)
fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.2)
generics 0.0.2 2018-11-29 [1] CRAN (R 3.6.2)
ggplot2 * 3.2.1 2019-08-10 [1] CRAN (R 3.6.2)
glue 1.4.0 2020-04-03 [1] CRAN (R 3.6.3)
gtable 0.3.0 2019-03-25 [1] CRAN (R 3.6.2)
hms 0.5.3 2020-01-08 [1] CRAN (R 3.6.2)
htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.6.2)
htmlwidgets 1.5.1 2019-10-08 [1] CRAN (R 3.6.2)
httpuv 1.5.2 2019-09-11 [1] CRAN (R 3.6.2)
httr 1.4.1 2019-08-05 [1] CRAN (R 3.6.2)
jsonlite 1.6.1 2020-02-02 [1] CRAN (R 3.6.2)
knitr * 1.28 2020-02-06 [1] CRAN (R 3.6.2)
later 1.0.0 2019-10-04 [1] CRAN (R 3.6.2)
lazyeval 0.2.2 2019-03-15 [1] CRAN (R 3.6.2)
lifecycle 0.2.0 2020-03-06 [1] CRAN (R 3.6.3)
lubridate * 1.7.8 2020-04-06 [1] CRAN (R 3.6.3)
magrittr * 1.5 2014-11-22 [1] CRAN (R 3.6.2)
memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.2)
mime 0.9 2020-02-04 [1] CRAN (R 3.6.2)
munsell 0.5.0 2018-06-12 [1] CRAN (R 3.6.2)
pillar 1.4.3 2019-12-20 [1] CRAN (R 3.6.2)
pkgbuild 1.0.6 2019-10-09 [1] CRAN (R 3.6.2)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.2)
pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.2)
plotly * 4.9.2 2020-02-12 [1] CRAN (R 3.6.2)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 3.6.2)
processx 3.4.2 2020-02-09 [1] CRAN (R 3.6.2)
promises 1.1.0 2019-10-04 [1] CRAN (R 3.6.2)
ps 1.3.2 2020-02-13 [1] CRAN (R 3.6.2)
purrr 0.3.4 2020-04-17 [1] CRAN (R 3.6.3)
R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.2)
Rcpp 1.0.4.6 2020-04-09 [1] CRAN (R 3.6.3)
readr 1.3.1 2018-12-21 [1] CRAN (R 3.6.2)
remotes 2.1.1 2020-02-15 [1] CRAN (R 3.6.2)
rlang 0.4.5 2020-03-01 [1] CRAN (R 3.6.3)
rmarkdown 2.1 2020-01-20 [1] CRAN (R 3.6.2)
rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.2)
scales 1.1.0 2019-11-18 [1] CRAN (R 3.6.2)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.2)
shiny 1.4.0 2019-10-10 [1] CRAN (R 3.6.2)
stringi 1.4.6 2020-02-17 [1] CRAN (R 3.6.2)
stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.2)
testthat 2.3.2 2020-03-02 [1] CRAN (R 3.6.3)
tibble 3.0.1 2020-04-20 [1] CRAN (R 3.6.3)
tidyr 1.0.2 2020-01-24 [1] CRAN (R 3.6.2)
tidyselect 1.0.0 2020-01-27 [1] CRAN (R 3.6.2)
usethis 1.6.0 2020-04-09 [1] CRAN (R 3.6.3)
utf8 1.1.4 2018-05-24 [1] CRAN (R 3.6.2)
vctrs 0.2.4 2020-03-10 [1] CRAN (R 3.6.3)
viridisLite 0.3.0 2018-02-01 [1] CRAN (R 3.6.2)
withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.2)
xfun 0.12 2020-01-13 [1] CRAN (R 3.6.2)
xtable 1.8-4 2019-04-21 [1] CRAN (R 3.6.2)
yaml 2.2.1 2020-02-01 [1] CRAN (R 3.6.2)
[1] C:/Users/an499583/Documents/R/win-library/3.6
[2] C:/Users/an499583/Documents/R/R-3.6.3/library